In contact-rich tasks, like dexterous manipulation, the hybrid nature of making and breaking contact creates challenges for model representation and control. For example, choosing and sequencing contact locations for in-hand manipulation, where there are thousands of potential hybrid modes, is not generally tractable. In this paper, we are inspired by the observation that far fewer modes are actually necessary to accomplish many tasks. Building on our prior work learning hybrid models, represented as linear complementarity systems, we find a reduced-order hybrid model requiring only a limited number of task-relevant modes. This simplified representation, in combination with model predictive control, enables real-time control yet is sufficient for achieving high performance. We demonstrate the proposed method first on synthetic hybrid systems, reducing the mode count by multiple orders of magnitude while achieving task performance loss of less than 5%. We also apply the proposed method to a three-fingered robotic hand manipulating a previously unknown object. With no prior knowledge, we achieve state-of-the-art closed-loop performance in less than five minutes of online learning.
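This paper investigates the learning, or system identification, of a class of piecewise-affine dynamical systems known as linear complementarity systems (LCSs). We propose a violation-based loss that enables efficient learning of the LCS parameterization with gradient-based methods, without prior knowledge of the hybrid mode boundaries. The proposed violation-based loss incorporates both a dynamics prediction loss and a novel complementarity violation loss. We show several properties attained by this loss formulation, including its differentiability, the efficient computation of its first- and second-order derivatives, and its relationship to the traditional prediction loss, which strictly enforces complementarity. We apply this violation-based loss formulation to learn LCSs with tens of thousands of (potentially stiff) hybrid modes. The results demonstrate a state-of-the-art ability to identify piecewise-affine dynamics, outperforming methods that must differentiate through non-smooth linear complementarity problems.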
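For readers unfamiliar with the model class, a discrete-time LCS is commonly written in the form below. This is the standard textbook form, given here as a sketch; the symbol names (A, B, C, d, D, E, F, c) are assumptions for illustration and are not taken from the abstract.

```latex
\begin{aligned}
x_{t+1} &= A x_t + B u_t + C \lambda_t + d, \\
0 \le \lambda_t &\perp D x_t + E u_t + F \lambda_t + c \ge 0 .
\end{aligned}
```

Here x_t is the state, u_t the input, and \lambda_t the complementarity variable (for example, a contact force). Each choice of which components of \lambda_t are zero and which are positive corresponds to one hybrid mode, so m complementarity pairs give up to 2^m modes, which is why enumerating mode boundaries quickly becomes impractical.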
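Inspired by recent strides in the empirical efficacy of implicit learning in many robotics tasks, we seek to understand the theoretical benefits of implicit formulations in the face of nearly discontinuous functions, a common characteristic of systems that make and break contact with the environment, such as in manipulation. We present and motivate three formulations for learning a function: one explicit and two implicit. We derive generalization bounds for each of these three approaches, exposing where explicit and implicit methods alike, when based on prediction error losses, typically fail to produce tight bounds, in contrast to other implicit methods with violation-based loss definitions, which can be fundamentally more robust to steep slopes. Furthermore, we demonstrate that this violation implicit loss can tightly bound graph distance, a quantity that often has physical roots and handles noise in inputs and outputs alike, unlike prediction losses, which consider output noise only. Our insights into the generalizability and physical relevance of violation implicit formulations match evidence from prior works and are validated through a toy problem, inspired by rigid-contact models and referenced throughout our theoretical analysis.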
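A small worked example may help show why graph distance tolerates steep slopes better than prediction error. It is only a sketch: the linear function f(x) = kx and the reading of "graph distance" as Euclidean distance from a point to the graph of f are assumptions made for illustration, and the paper's precise definitions are not given in the abstract.

```latex
\begin{aligned}
& f(x) = kx, \qquad (\tilde{x}, \tilde{y}) = (x_0 + \varepsilon,\; k x_0), \qquad k, \varepsilon > 0, \\
& \text{prediction error:} \quad |f(\tilde{x}) - \tilde{y}| = k \varepsilon, \\
& \text{graph distance:} \quad \min_{x} \left\| (\tilde{x}, \tilde{y}) - (x, f(x)) \right\|_2 = \frac{k \varepsilon}{\sqrt{1 + k^2}} < \varepsilon .
\end{aligned}
```

An input perturbation of size \varepsilon produces a prediction error that grows linearly with the slope k, whereas the distance to the graph never exceeds \varepsilon no matter how steep the function is.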
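A realistic simulation environment is an essential tool in every roboticist's toolkit, with uses ranging from planning and control to training policies with reinforcement learning. Despite the centrality of simulation in modern robotics, little work has been done to compare the performance of robotics simulators against real-world data, especially for scenarios involving dynamic motions with high-speed impact events. Handling dynamic contact is the computational bottleneck for most simulations, and thus the modeling and algorithmic choices surrounding impacts and friction form the largest distinctions between popular tools. Here, we evaluate the ability of several simulators to reproduce real-world trajectories involving impacts. Using experimental data, we identify system-specific contact parameters for the popular simulators Drake, MuJoCo, and Bullet, and analyze the effects of the modeling choices around these parameters. For the simple example of a cube tossed onto a table, the simulators capture inelastic impacts well while failing to capture elastic impacts. For the higher-dimensional case of a Cassie biped landing from a jump, the simulators capture the bulk motion well, but accuracy is limited by numerous model differences between the real robot and the simulators.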
Dataset distillation has emerged as a prominent technique to improve data efficiency when training machine learning models. It encapsulates the knowledge from a large dataset into a smaller synthetic dataset. A model trained on this smaller distilled dataset can attain comparable performance to a model trained on the original training dataset. However, the existing dataset distillation techniques mainly aim at achieving the best trade-off between resource usage efficiency and model utility. The security risks stemming from them have not been explored. This study performs the first backdoor attack against the models trained on the data distilled by dataset distillation models in the image domain. Concretely, we inject triggers into the synthetic data during the distillation procedure rather than during the model training stage, where all previous attacks are performed. We propose two types of backdoor attacks, namely NAIVEATTACK and DOORPING. NAIVEATTACK simply adds triggers to the raw data at the initial distillation phase, while DOORPING iteratively updates the triggers during the entire distillation procedure. We conduct extensive evaluations on multiple datasets, architectures, and dataset distillation techniques. Empirical evaluation shows that NAIVEATTACK achieves decent attack success rate (ASR) scores in some cases, while DOORPING reaches higher ASR scores (close to 1.0) in all cases. Furthermore, we conduct a comprehensive ablation study to analyze the factors that may affect the attack performance. Finally, we evaluate multiple defense mechanisms against our backdoor attacks and show that our attacks can practically circumvent these defense mechanisms.
We present a dynamic path planning algorithm to navigate an amphibious rotor craft through a concave time-invariant obstacle field while attempting to minimize energy usage. We create a nonlinear quaternion state model that represents the rotor craft dynamics above and below the water. The 6-degree-of-freedom dynamics are used within a layered architecture to generate motion paths for the vehicle to follow, along with the required control inputs. The rotor craft has a 3-dimensional map of its surroundings that is updated via limited-range onboard sensor readings within the current medium (air or water). Path planning is done via PRM and D* Lite.
While the capabilities of autonomous systems have been steadily improving in recent years, these systems still struggle to rapidly explore previously unknown environments without the aid of GPS-assisted navigation. The DARPA Subterranean (SubT) Challenge aimed to fast-track the development of autonomous exploration systems by evaluating their performance in real-world underground search-and-rescue scenarios. Subterranean environments present a plethora of challenges for robotic systems, such as limited communications, complex topology, visually degraded sensing, and harsh terrain. The presented solution enables long-term autonomy with minimal human supervision by combining a powerful and independent single-agent autonomy stack with higher-level mission management operating over a flexible mesh network. The autonomy suite deployed on quadruped and wheeled robots was fully independent, freeing the human supervisor to loosely supervise the mission and make high-impact strategic decisions. We also discuss lessons learned from fielding our system at the SubT Final Event, relating to vehicle versatility, system adaptability, and reconfigurable communications.
We present Muse, a text-to-image Transformer model that achieves state-of-the-art image generation performance while being significantly more efficient than diffusion or autoregressive models. Muse is trained on a masked modeling task in discrete token space: given the text embedding extracted from a pre-trained large language model (LLM), Muse is trained to predict randomly masked image tokens. Compared to pixel-space diffusion models, such as Imagen and DALL-E 2, Muse is significantly more efficient because it uses discrete tokens and requires fewer sampling iterations; compared to autoregressive models, such as Parti, Muse is more efficient because it uses parallel decoding. The use of a pre-trained LLM enables fine-grained language understanding, translating to high-fidelity image generation and the understanding of visual concepts such as objects, their spatial relationships, pose, cardinality, etc. Our 900M parameter model achieves a new SOTA on CC3M, with an FID score of 6.06. The Muse 3B parameter model achieves an FID of 7.88 on zero-shot COCO evaluation, along with a CLIP score of 0.32. Muse also directly enables a number of image editing applications without the need to fine-tune or invert the model: inpainting, outpainting, and mask-free editing. More results are available at https://muse-model.github.io
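The masked modeling objective described above can be illustrated with a minimal sketch of the masking step alone. Everything here is an assumption for illustration: the `MASK_ID` value, the `mask_tokens` helper, the fixed mask ratio, and the toy token ids are hypothetical, and the abstract does not specify Muse's actual tokenizer, masking schedule, or text conditioning.

```python
import random

MASK_ID = -1  # hypothetical id reserved for the mask token (real ids are >= 0 here)


def mask_tokens(tokens, mask_ratio, rng):
    """Randomly mask a fraction of discrete tokens.

    Returns the masked sequence and a dict mapping each masked position
    to the original token the model would be trained to predict.
    """
    # Mask at least one token, but never more than the sequence holds.
    n_mask = min(len(tokens), max(1, round(mask_ratio * len(tokens))))
    positions = set(rng.sample(range(len(tokens)), n_mask))
    masked = [MASK_ID if i in positions else t for i, t in enumerate(tokens)]
    targets = {i: tokens[i] for i in positions}
    return masked, targets


# Toy "image tokens" standing in for the output of a discrete image tokenizer.
image_tokens = [12, 7, 7, 93, 41, 5, 60, 18]
masked, targets = mask_tokens(image_tokens, mask_ratio=0.5, rng=random.Random(0))
```

A model trained on such pairs sees `masked` (plus the text embedding) as input and is penalized only on the positions listed in `targets`. Because all masked positions are predicted at once, several tokens can be filled in per step at sampling time, which is the intuition behind the parallel decoding mentioned above.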
The visual dimension of cities has been a fundamental subject in urban studies, since the pioneering work of scholars such as Sitte, Lynch, Arnheim, and Jacobs. Several decades later, big data and artificial intelligence (AI) are revolutionizing how people move, sense, and interact with cities. This paper reviews the literature on the appearance and function of cities to illustrate how visual information has been used to understand them. A conceptual framework, Urban Visual Intelligence, is introduced to systematically elaborate on how new image data sources and AI techniques are reshaping the way researchers perceive and measure cities, enabling the study of the physical environment and its interactions with socioeconomic environments at various scales. The paper argues that these new approaches enable researchers to revisit the classic urban theories and themes, and potentially help cities create environments that are more in line with human behaviors and aspirations in the digital age.
Logic Mill is a scalable and openly accessible software system that identifies semantically similar documents within either one domain-specific corpus or multi-domain corpora. It uses advanced Natural Language Processing (NLP) techniques to generate numerical representations of documents. Currently it leverages a large pre-trained language model to generate these document representations. The system focuses on scientific publications and patent documents and contains more than 200 million documents. It is easily accessible via a simple Application Programming Interface (API) or via a web interface. Moreover, it is continuously being updated and can be extended to text corpora from other domains. We see this system as a general-purpose tool for future research applications in the social sciences and other domains.
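To make the idea of comparing documents through numerical representations concrete, here is a minimal sketch that ranks documents by cosine similarity between embedding vectors. Cosine similarity is an assumption chosen for illustration: the abstract does not state which similarity measure or language model Logic Mill uses, and the function names, document ids, and three-dimensional vectors below are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query, corpus, top_k=2):
    """Return the top_k (doc_id, score) pairs, highest similarity first."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in corpus.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy embeddings; a real system would obtain these from a language model.
corpus = {
    "patent_a": [0.9, 0.1, 0.0],
    "paper_b": [0.8, 0.2, 0.1],
    "paper_c": [0.0, 0.1, 0.9],
}
query = [1.0, 0.0, 0.0]
ranked = most_similar(query, corpus)
```

With these toy vectors, `ranked` lists `patent_a` first and `paper_b` second. A corpus of hundreds of millions of documents would need an approximate nearest-neighbor index rather than this exhaustive scan.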